SUPPORT VECTOR MACHINES¶
Linearly Seperable Data¶
Mathematical Intuition Behind Support Vector Machines (SVM)¶
Support Vector Machines (SVM) are supervised learning models used for classification and regression. SVMs are particularly well-suited for classification tasks and aim to find the optimal hyperplane that separates different classes in the feature space.
Key Concepts¶
Hyperplane:
- In an (n)-dimensional space, a hyperplane is an (n-1)-dimensional subspace that separates the space into two half-spaces.
- For a 2D space, the hyperplane is a line; for a 3D space, it is a plane.
Margin:
- The margin is the distance between the hyperplane and the nearest data points from each class. These points are called support vectors.
- SVM aims to maximize this margin, making the classifier more robust.
Support Vectors:
- These are the data points that are closest to the hyperplane and influence its position and orientation.
- They lie on the edge of the margin.
Optimal Hyperplane:
- The optimal hyperplane is the one that maximizes the margin between the two classes.
Formulation¶
Given a dataset¶
{(𝑥𝑖,𝑦𝑖)}i=1 to m, where 𝑥𝑖 ∈ 𝑅^𝑛 are the feature vectors and yi ∈ {−1,1} are the class labels, the goal is to find the hyperplane that best separates the classes. The hyperplane can be represented as:¶
𝑤⋅𝑥𝑖−𝑏=0¶
where¶
w is the normal vector to the hyperplane and b is the bias term.For any point xi, the decision rule is:¶
𝑤⋅𝑥𝑖−𝑏≥1 if 𝑦𝑖=1¶
𝑤⋅𝑥𝑖−𝑏≤−1 if 𝑦𝑖=−1¶
This can be combined into one constraint:¶
𝑦𝑖(𝑤⋅𝑥𝑖−𝑏)≥1¶
Dual Problem¶
The primal problem can be converted to the dual problem by maximizing the Lagrangian with respect to w and b
Solution¶
Once the 𝛼𝑖 are found by solving the dual problem, the weight vector 𝑤 and bias 𝑏 can be computed as:
for any support vector xk.
Predictions¶
Given a new data point xk, the decision function is:
f(x)=w⋅x−b
The predicted class label is:
yhat = sign(f(x))
For non-linearly separable data, we introduce slack variables 𝜉𝑖 to allow some misclassifications, leading to the soft margin SVM. The optimization problem becomes: